A brief introduction to using R for high-performance computing

George G. Vega Yon
ggvy.cl
USC Integrative Methods of Analysis for Genetic Epidemiology (IMAGE)
Department of Preventive Medicine

September 13, 2018

Agenda

  1. High-Performance Computing: An overview

  2. Parallel computing in R

  3. Extended examples

High-Performance Computing: An overview

Loosely, from R’s perspective, we can think of HPC in terms of two, maybe three things:

  1. Big data: How to work with data that doesn’t fit your computer

  2. Parallel computing: How to take advantage of multiple core systems

  3. Compiled code: Write your own low-level code (if R doesn’t have it yet…)

(Check out the CRAN Task View on HPC)

Big Data

  • Buy a bigger computer/more RAM (not the best solution!)

  • Use out-of-memory storage, i.e., don’t load all your data into RAM, e.g. the bigmemory, data.table, and HadoopStreaming R packages

  • Efficient algorithms for big data, e.g.: biglm, biglasso

  • Store it more efficiently, e.g.: Sparse Matrices (take a look at the dgCMatrix objects from the Matrix R package)
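As a small illustration of the last point, here is a hedged sketch of how a mostly-zero matrix can be stored far more compactly as a dgCMatrix (exact sizes and the coercion class depend on your Matrix version):

```r
library(Matrix)

# A 1,000 x 1,000 matrix with only ~1% nonzero entries
set.seed(1)
n   <- 1e3
nnz <- n ^ 2 * .01
dense <- matrix(0, n, n)
dense[sample(n ^ 2, nnz)] <- runif(nnz)

sparse <- as(dense, "CsparseMatrix")  # stored as a dgCMatrix
class(sparse)

object.size(dense)   # every cell is stored (~8 MB here)
object.size(sparse)  # far smaller: only nonzero values plus their indices
```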

Parallel computing

We will be focusing on the Single Instruction stream, Multiple Data stream (SIMD) model.

Some vocabulary for HPC

In raw terms

  • Supercomputer: A single big machine with thousands of cores/gpus.

  • High Performance Computing (HPC): Multiple machines within a single network.

  • High Throughput Computing (HTC): Multiple machines across multiple networks.

You may not have access to a supercomputer, but HPC/HTC clusters are certainly more accessible these days; e.g., AWS provides a service to create HPC clusters at a low cost (allegedly, since nobody understands how pricing works).

GPU vs CPU

  • Why use OpenMP if GPUs are suited to compute-intensive operations? Mostly because OpenMP is VERY easy to implement (much easier than CUDA, which is itself the easiest way to use a GPU).

Let’s think before we start…

When is it a good idea to go HPC?

When is it a good idea?

Ask yourself these questions before jumping into HPC!

Parallel computing in R

While there are several alternatives (just take a look at the High-Performance Computing Task View), we’ll focus on the following R-packages for explicit parallelism:

  • parallel: R package that provides ‘[s]upport for parallel computation, including random-number generation’.

  • foreach: R package for ‘general iteration over elements’ in parallel fashion.

  • future: ‘[A] lightweight and unified Future API for sequential and parallel processing of R expression via futures.’ (won’t cover here)

Implicit parallelism, on the other hand, is provided by out-of-the-box tools that free the programmer from worrying about parallelization, e.g. gpuR for matrix manipulation using the GPU, or tensorflow.

And there’s also a more advanced set of options

  • Rcpp + OpenMP: Rcpp is an R package for integrating R with C++, and OpenMP is a library for high-level parallelism for C/C++ and Fortran.

  • A ton of other types of resources, notably tools for working with batch schedulers such as Slurm, HTCondor, etc.

Parallel workflow

(Usually) We do the following:

  1. Create a PSOCK/FORK (or other) cluster using makePSOCKcluster/makeForkCluster (or makeCluster)

  2. Copy/prepare each R session (if you are using a PSOCK cluster):

    1. Copy objects with clusterExport

    2. Pass expressions with clusterEvalQ

    3. Set a seed

  3. Do your call: parApply, parLapply, etc.

  4. Stop the cluster with stopCluster
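Putting the four steps together, a minimal sketch (the dataset my_data and the use of MASS are placeholders for whatever your workers actually need):

```r
library(parallel)

my_data <- data.frame(x = rnorm(100))  # placeholder data

cl <- makePSOCKcluster(4L)             # 1. create a PSOCK cluster
clusterExport(cl, "my_data")           # 2a. copy objects to the workers
clusterEvalQ(cl, library(MASS))        # 2b. run expressions on each worker
clusterSetRNGStream(cl, 123)           # 2c. set a parallel-safe seed
ans <- parSapply(cl, 1:4, function(i)  # 3. make the call
  mean(my_data$x) + rnorm(1))
stopCluster(cl)                        # 4. stop the cluster
```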

Types of clusters: PSOCK

  • Can be created with makePSOCKcluster

  • Creates brand-new R sessions, so nothing is inherited from the master

  • Child sessions are connected to the master session via socket connections

  • Can be created outside of the current computer, i.e., across multiple computers!

Types of clusters: Fork

  • Can be created with makeForkCluster

  • Uses OS forking: the current R session is copied locally, so everything is inherited from the master up to that point

  • Data is only duplicated if it is altered (copy-on-write; worth double-checking when this happens!)

  • Not available on Windows

  • Other cluster types can be created with makeCluster, which passes them on to snow (Simple Network of Workstations)

Take a look at the foreach, Rmpi, and future R packages.

Let’s get started!

Ex 1: Parallel RNG with makePSOCKcluster

##               [,1]          [,2]          [,3]          [,4]
## [1,]  0.0861888293 -0.0001633431  5.939143e-04 -3.672845e-04
## [2,] -0.0001633431  0.0853841838  2.390790e-03 -1.462154e-04
## [3,]  0.0005939143  0.0023907904  8.114219e-02 -4.714618e-06
## [4,] -0.0003672845 -0.0001462154 -4.714618e-06  8.467722e-02

Making sure it is reproducible

## [1] TRUE
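The code chunk is not shown in this rendering; a hedged reconstruction of what produced the covariance matrix and the TRUE above could be:

```r
library(parallel)

cl <- makePSOCKcluster(4L)
clusterSetRNGStream(cl, 123)  # reproducible streams on each worker
ans0 <- parSapply(cl, 1:4, function(i) rnorm(1e3))
cov(ans0)                     # the 4 x 4 matrix shown above

# Rerun with the same seed: we get the exact same draws
clusterSetRNGStream(cl, 123)
ans1 <- parSapply(cl, 1:4, function(i) rnorm(1e3))
identical(ans0, ans1)
stopCluster(cl)
```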

Ex 2: Parallel RNG with makeForkCluster

In the case of makeForkCluster

##               [,1]          [,2]          [,3]          [,4]
## [1,]  0.0861888293 -0.0001633431  5.939143e-04 -3.672845e-04
## [2,] -0.0001633431  0.0853841838  2.390790e-03 -1.462154e-04
## [3,]  0.0005939143  0.0023907904  8.114219e-02 -4.714618e-06
## [4,] -0.0003672845 -0.0001462154 -4.714618e-06  8.467722e-02

Again, we want to make sure this is reproducible

##      [,1] [,2] [,3] [,4]
## [1,]    0    0    0    0
## [2,]    0    0    0    0
## [3,]    0    0    0    0
## [4,]    0    0    0    0
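Again the chunk itself is omitted; a hedged sketch of the Fork-cluster version, where the reproducibility check subtracts the two runs (hence the matrix of zeros above):

```r
library(parallel)

cl <- makeForkCluster(4L)     # workers inherit the master session
clusterSetRNGStream(cl, 123)
ans0 <- parSapply(cl, 1:4, function(i) rnorm(1e3))
cov(ans0)

clusterSetRNGStream(cl, 123)
ans1 <- parSapply(cl, 1:4, function(i) rnorm(1e3))
ans0 - ans1                   # all zeros if the streams were reset correctly
stopCluster(cl)
```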

Well, if you are a macOS/Linux user, there’s a simpler way of doing this…

Ex 3: Parallel RNG with mclapply (Forking on the fly)

In the case of mclapply, the forking (cluster creation) is done on the fly!

##              [,1]        [,2]         [,3]         [,4]
## [1,]  0.085384184 0.002390790  0.006576204 -0.003998278
## [2,]  0.002390790 0.081142190  0.001846963  0.001476244
## [3,]  0.006576204 0.001846963  0.085175347 -0.002807348
## [4,] -0.003998278 0.001476244 -0.002807348  0.082425477

Once more, we want to make sure this is reproducible

##      [,1] [,2] [,3] [,4]
## [1,]    0    0    0    0
## [2,]    0    0    0    0
## [3,]    0    0    0    0
## [4,]    0    0    0    0
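A hedged sketch of the mclapply version: with the L’Ecuyer-CMRG generator, mclapply seeds each forked worker reproducibly, so resetting the seed and rerunning yields the matrix of zeros above:

```r
library(parallel)

set.seed(123, kind = "L'Ecuyer-CMRG")
ans0 <- cov(do.call(cbind, mclapply(1:4, function(i) rnorm(1e3), mc.cores = 4L)))

set.seed(123, kind = "L'Ecuyer-CMRG")
ans1 <- cov(do.call(cbind, mclapply(1:4, function(i) rnorm(1e3), mc.cores = 4L)))

ans0 - ans1  # all zeros
```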

Ex 5: RcppArmadillo + OpenMP

## 
##      0 
## 250000
##                     test replications elapsed relative
## 2 dist_par(x, cores = 1)           10   3.146    1.000
## 4 dist_par(x, cores = 4)           10   3.904    1.241
## 3 dist_par(x, cores = 2)           10   4.288    1.363
## 1                dist(x)           10   4.597    1.461
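The implementation benchmarked above is not shown here. As a hypothetical sketch of what a dist_par() could look like (the original used RcppArmadillo; plain Rcpp is used here for brevity): pairwise Euclidean distances with the outer loop parallelized via OpenMP, and cores setting the thread count.

```r
library(Rcpp)

cppFunction(plugins = "openmp", includes = "#ifdef _OPENMP
#include <omp.h>
#endif", code = '
NumericMatrix dist_par(NumericMatrix x, int cores = 1) {
  int n = x.nrow(), k = x.ncol();
  NumericMatrix ans(n, n);
#ifdef _OPENMP
  omp_set_num_threads(cores);
#endif
#pragma omp parallel for
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < i; ++j) {
      double d = 0.0;
      for (int l = 0; l < k; ++l)
        d += (x(i, l) - x(j, l)) * (x(i, l) - x(j, l));
      ans(i, j) = ans(j, i) = std::sqrt(d);
    }
  return ans;
}')

x <- matrix(rnorm(500 * 10), ncol = 10)
range(dist_par(x, cores = 2) - as.matrix(dist(x)))  # differences should be ~0
```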

Ex 6: The future

Thanks!

gvegayon
@gvegayon
ggvy.cl

Presentation created with revealjs

See also

For more, checkout the CRAN Task View on HPC

Session info

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] future_1.10.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0       codetools_0.2-15 listenv_0.7.0    revealjs_0.9    
##  [5] digest_0.6.18    rprojroot_1.3-2  backports_1.1.2  magrittr_1.5    
##  [9] evaluate_0.12    icon_0.1.0       highr_0.7        stringi_1.2.4   
## [13] rmarkdown_1.10   tools_3.5.1      stringr_1.3.1    yaml_2.2.0      
## [17] compiler_3.5.1   globals_0.12.4   htmltools_0.3.6  knitr_1.20

Bonus track 1: Simulating \(\pi\)

  • We know that \(\pi = \frac{A}{r^2}\). We can approximate it by randomly drawing points \(x\) from a square of side 2 centered at the origin.

  • So, we approximate \(\pi\) as \(\Pr\{\|x\| \leq 1\}\times 2^2\)

The R code to do this
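The chunk itself was stripped in this rendering; a hedged sketch of serial and parallel versions consistent with the benchmark on the next slide:

```r
# P(||x|| <= 1) for x uniform on [-1,1]^2 equals pi/4, so we rescale by 4
simpi <- function(n) {
  ans <- matrix(runif(n * 2, -1, 1), ncol = 2)
  mean(sqrt(rowSums(ans ^ 2)) <= 1) * 4
}

# Serial version
set.seed(1231)
simpi(1e6)

# Parallel version: forking on the fly with mclapply (macOS/Linux)
library(parallel)
set.seed(1231, kind = "L'Ecuyer-CMRG")
mean(unlist(mclapply(1:4, function(i) simpi(2.5e5), mc.cores = 4L)))
```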

Bonus track 1: Simulating \(\pi\) (cont’d)

##       test replications elapsed relative
## 1 parallel            1   0.398    1.000
## 2   serial            1   1.040    2.613
##      par      ser        R 
## 3.141561 3.141247 3.141593

Bonus track 2: HPC with Slurm

  • Suppose that we would like to maximize/minimize a function using a stochastic optimization algorithm, namely, the Artificial Bee Colony algorithm

  • The following R script (01-slurm-abcoptim.R) was designed to work with Slurm (it requires the R package ABCoptim [@ABCoptim])

  • Notice that we are using SLURM_JOB_ID and SLURM_ARRAY_TASK_ID to save our results (both environment variables are created by Slurm)
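The script itself is not reproduced here; a hedged sketch of what such a script could contain (the objective function and the abc_optim arguments below are made up for illustration):

```r
# Sketch of a Slurm-driven R script
library(ABCoptim)

# Environment variables set by Slurm for each array task
job_id  <- Sys.getenv("SLURM_JOB_ID")
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID"))

# Each array task uses a different seed
set.seed(task_id)

# A toy objective function (hypothetical)
fn  <- function(x) sum((x - 1) ^ 2)
ans <- abc_optim(par = runif(2, -10, 10), fn = fn, lb = -10, ub = 10)

# Save results tagged with the Slurm IDs
saveRDS(ans, sprintf("abc-%s-%04d.rds", job_id, task_id))
```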

Now you try it!